Fritz: All Fritz

home *** CD-ROM | disk | FTP | other *** search

/ Fritz: All Fritz / All Fritz.zip / All Fritz / FILES / PROGSCAL / TINYPASC.LZH / TU.DOC < prev next >

Wrap

Text File | 1986-06-03 | 100KB | 2,047 lines

Pascal in 25 Rules or Less William A. Barrett QCAD Systems, Inc., 1164 Hyde Ave, San Jose, CA 95129 In this article, we're going to design and implement a small subset Pascal compiler, using the Turbo Pascal compiler and a couple of other software tools that you can purchase for a nominal charge. It is capable of translating a Pascal subset program into 8086 symbolic assembly language, which can then be assembled into a running program on an IBM PC. When you look at our tiny Pascal, it may strike you as being so ridiculously simple that it has no useful applications. However, as we'll explain, it provides all the important features of a high-level programming language, and can be extended indefinitely by writing more support functions. By reading this article, or better, by ordering the support software described at the end of this article, you will not only have your own extensible compiler going, but will have learned how language translators and compilers are organized and written. So... carry on, please! Tiny Pascal Features A full Pascal language, like Turbo, has about a hundred features, and a large support library. We're going to choose just a few features, and simplify the language to such an extent that it probably isn't fair to call it Pascal -- it just resembles Pascal in some ways. In particular, we want a description of our language in just twenty-five production rules or less, since the Qparser demo pack supports just that many. Twenty-six rules, and the demo system won't work! Does this sound like a contest you've entered? Anyway, we've managed to pack all this into those twenty-five rules: * All four arithmetic operations in full expressions -- but on 16-bit integers only. * Assignment, IF-THEN-ELSE, BEGIN-END blocks, and WHILE-DO. * Functions with parameters * READ, WRITE, WRITELN with limited string support * Global variables * Integer literals Although our feature list is very short, it's large enough to write some impressive and useful programs. The reason is that tiny Pascal supports functions that can return values. If you need a capability, then you can always write a function to provide that capability. A Tiny Pascal Program A complete program written in tiny Pascal follows this paragraph. You should be able to follow what it does. When you run this through the tiny Pascal compiler, you will get page 1 Pascal in 25 Rules or Less an assembler program that is ready to be assembled and executed on your PC. That program will also contain all the high-level tiny Pascal statements as comments. The fully compiled and assembled listing is given near the end of this article. {TURUN -- A sample program written in Tiny Pascal } var I, J, K, PROBLEM; {*********************} function ISLESS(N1, N2); begin {returns 1 if n1<n2, 0 otherwise} if n2-n1 then isless:=1 {truth value test is >0} else isless:=0; end; function ADDEMUP(LOWER, UPPER, SUM); begin end; {makes it a forward declaration} {*********************} function ISEQUAL(N1, N2); begin if n2-n1 then isequal:=0 {false} else if n1-n2 then isequal:=0 else isequal:=1; end; {***********************} function ADDEMUP(LOWER, UPPER, SUM); {SUM is a local} begin sum:=0; while isless(lower, upper) do begin sum:=sum+lower; lower:=lower+1; end; addemup:=sum+lower; { the last one was left out } end; {*********************} function MAIN(SUM, UPPER); begin i:=1; j:=i+5; k:=j-16; problem:=i+(j*k); writeln('I: ', i, ' J: ', j, ' K: ', k, ' Problem: ', problem); write('Enter upper '); upper:=read; sum:=addemup(1, upper); {sum of integers 1..upper} if isequal(sum, (upper*(upper+1))/2) then writeln('Sum = ', sum) else begin writeln('BUG: Sum = ', sum, '; should be ', (upper*(upper+1))/2); end; end; page 2 Pascal in 25 Rules or Less More About Tiny Pascal Our functions pass parameters by VALUE only -- there's no VAR parameter capability. However, a function can return an integer value, or it can set the value of a global variable. That's a bit awkward sometimes, but it's still better than assembler. There are no Boolean data types as such, but the language uses 0 and 1 as a substitute where necessary. In particular, the expression in an IF or a WHILE tests an integer value for greater-than-zero (true) or less-than-or-equal-to-zero (false). That's good enough. For example, you would write if n1-n2 then ... in tiny Pascal, which is equivalent to if n1-n2>0 then ... or if n1>n2 then ... in full Pascal. The comparison operators can of course be added with more production rules. There are also no local variables in tiny Pascal, but it turns out they really aren't needed. For example, function ADDEMUP(LOWER, UPPER: integer): integer; var SUM: integer; begin ... end; in full Pascal can be written function ADDEMUP(LOWER, UPPER, SUM); begin ... end; in tiny Pascal. Note that we don't bother with the type designators (i.e. : integer) since only one type is supported. Also our compiler is arranged so that if ADDEMUP is called with only two parameter values, SUM is initialized to 0. For example, addemup(5, 10) causes LOWER=5, UPPER=10, and SUM=0 inside ADDEMUP. So you see the extra formal parameters have the same effect as the local variables in full Pascal. Of course, you can also call ADDEMUP with all three parameters, or none -- that's an unsafe defect in tiny Pascal which can't be avoided with our tight production rule limit. You'll just have to be sure to call functions with the correct number of parameters. Functions can be called within expressions, or on a line by themselves. For example, both of these statements are OK: addemup(5, 10); {returned value is ignored} x := y - addemup(5, 10); {returned value is subtracted page 3 Pascal in 25 Rules or Less from y} Literal strings are permitted only within a WRITE or WRITELN statement; they're intended to decorate messages to the standard output device only. If you want strings to appear in expressions and other functions, then you will have to introduce declarations for them and type designators -- that'll take more productions. READ and WRITE Tiny Pascal supports three I/O functions: READ, WRITE and WRITELN. READ is called with no parameters, and returns an integer value entered at the keyboard. WRITE takes any number of integer expressions or strings, and writes them to the console screen in left-to-right order. No line return is emitted at the end, so you can call WRITE several times before sending a line return. WRITELN is the same as WRITE, except that it emits a line return after writing its parameters. Both these can take no parameters if you like -- WRITE does nothing, and WRITELN emits a return. It happens that our integers will be read or written in hexadecimal for simplicity, but you can easily change the STDIO.HDR file to support decimal notation instead. The MAIN Program There must be one function whose name is MAIN in each tiny Pascal program. That'll be the first function called. It may have parameters, but we haven't provided any way of passing parameters to it from the console -- Turbo parameters are strings anyway and would have to be translated into integers somehow. MAIN can of course call any other procedure, and it's through procedure calls and use of the control structures that algorithms can be built up. Recursion and FORWARD Functions Tiny Pascal supports recursion -- a function can call itself any number of times. Each time a function is called, its local variables are pushed onto a runtime stack, along with its return address. A special 8086 register -- the BP register -- is used to mark the position of these variables so that they can be accessed from within a function. We'll explain just how that works later. Sometimes you'll have a procedure A that calls B that calls A. The problem is that A needs a declaration for B. You can supply that in tiny Pascal by writing a function for B with an empty Stmt, like this: function B(P1, P2); begin end; You can even omit the begin end; if you want to. The tiny Pascal compiler will recognize this as a so-called forward declaration. In full Pascal, you'd write it like this: page 4 Pascal in 25 Rules or Less function B(P1, P2: integer): integer; forward; The Grammar Computer languages are described by a context-free grammar, which is just a list of production rules. Here's the grammar for tiny Pascal, all twenty-five rules of it: \ TU.GRM -- tiny Pascal grammar held to 25 productions Goal -> FDeclList FDeclList -> FDeclList FuncDecl ; -> FuncDecl ; FuncDecl -> FUNCTION <identifier> ( ExprList ) ; Stmt #FDECL -> VAR ExprList #VDECL \ Global variables \ ExprList must be identifiers only Stmt -> <identifier> := Expr #ASSIGN -> IF Expr THEN Stmt ELSE Stmt #IFTHEN -> WHILE Expr DO Stmt #WHILEDO -> BEGIN StmtList END #BLOCK -> Expr #SEXPR \ procedure call only! StmtList -> StmtList Stmt ; #STLIST2 -> <empty> Expr -> Expr + Term #ADDOPR -> Expr - Term #SUBOPR -> Term Term -> Term * Primary #MPYOPR -> Term / Primary #DIVOPR -> Primary Primary -> ( Expr ) #PAREN -> <integer> \ only type INTEGER supported -> <string> -> <identifier> \ variable or function call w/o parameters -> <identifier> ( ExprList ) #FUNCP ExprList -> ExprList , Expr #EXPRLIST2 -> Expr #EXPRLIST1 First, a word about the notation. Each rule takes up one line. A backslash starts a comment, which runs to the end of the line. The pound sign (#) marks a production flag, which we'll use as a kind of handle in the compiler's parser. A rule with nothing to the left of the symbol -> is assumed to have the same symbol as the preceding rule. For example, the third rule is written -> FuncDecl ; but actually means FDeclList -> FuncDecl ; The first rule, with the left member Goal, expands into a complete tiny Pascal program. From the second and third rules, a program is a sequence of FuncDecl, separated by semicolons. Each FuncDecl in turn is either a FUNCTION declaration or a VAR declaration, according to the fourth and fifth rules. In fact, we will insist that a program consist only of some page 5 Pascal in 25 Rules or Less VAR declarations (the global variables) followed by the FUNCTION declarations. Each FUNCTION declaration carries a name (the <identifier>), an expression list (with at least one expression), and a statement Stmt. Note that a Stmt can be a list of statements StmtList enclosed in a BEGIN-END pair, so that the body of a function can consist of one, two, three or more statements. There are four kinds of statement, an assignment, an IF-THEN-ELSE, a WHILE-DO, a BEGIN-END block, and an expression -- which in fact must be just a function call. Note that the ELSE must always be present in an IF-THEN-ELSE, unlike full Pascal. Also note that Stmt can be <empty>, i.e. absent. Thus if a-b THEN ELSE BEGIN ... END is a legal statement. For expressions, we support all four arithmetic operations on our integers, as well as parenthesizing. The production rules guarantee that the usual Pascal precedence rules apply, and that things inside parentheses will be evaluated first. The form <integer> stands for any unsigned integer. <string> stands for a Pascal literal string, for example 'Here is a Pascal string' The form <identifier> stands for any legal Pascal name. Inside an expression, the name could stand for a global or local variable, or a function name. As a function name, the function will be called with zero or more parameters, returning some value. Compiler Overview Let's first take a broad look at our compiler, then we'll look at some of the details. We need to look at the forest before we study the trees. (The details are in fact voluminous as you might expect, and are best examined from the source files directly using your favorite editor.) We'll be using a compiler generator tool for the backbreaking part of the job -- our choice is called Qparser, and is available from QCAD Systems, Inc. Qparser accepts the 25-rule grammar we've written above and generates a complete Pascal program from it automatically. All you have to do is run two programs called LR1 and LR1P with the appropriate file names, and you'll get a Turbo Pascal source file. (Specific instructions are included with the product.) Now Qparser only generates a parser -- that's a program that will take a tiny Pascal source file and break it down into small steps. Each step will be a production rule reduce or apply operation. We need to add some more Pascal source code to that `front-end' compiler, to get it to generate assembler code. In particular, the compiler will: page 6 Pascal in 25 Rules or Less 1. Construct an abstract syntax tree (AST) for a whole function. This will be built from unit records allocated from the heap, linked together with pointers in various ways. The AST will carry a complete description of all the variables, statements, expressions, function calls, etc. in the function. 2. When a function AST is complete, its local variables will be added to a symbol table, along with their positions relative to register BP. 3. Then the statements and expressions are turned into 8086 assembly code through a recursive tree-walking procedure called EVAL. EVAL is rather simple to describe, and -- as we shall see -- produces rather tight 8086 code for our integer operations, assignments, function calls, IF-THENs, and WHILE-DOs. It's also easy to understand, extend or modify. The tiny Pascal compiler acts somewhat like a batch compiler -- it collects all the lines for a function, then emits all the code for the function. We've tried to make the resulting assembly code easy to read by copying the function source lines as comments, and by writing the names of local variables as comments. (Local variable references will all appear as offsets from register BP in the assembly code, rather than as names.) Designing the Compiler Once we've written the grammar, we can pass it through Qparser, then try out the automatically-generated parser on a sample program -- such as TURUN given earlier. That will verify that our grammar describes what we expect it to. The next big step is designing compiler data structures to support the AST that will hold all the features of a complete function. It turns out that's easy to do, too -- the Qparser files contains a sample Pascal record structure designed for that purpose, called a SEMREC. Here's what the generic SEMREC looks like as supplied in the Qparser software: type SEMTYPE = (OTHER, IDENT, FIXED, STRNG); SEMRECP = ^semrec; SEMREC = record case SEMT: semtype of { semantic stack structure } ident: (SYMP: symtabp); fixed: (NUMVAL: integer); strng: (STX: integer); {index to string table} end; This supports only three kinds of object: an identifier, a fixed-point integer and a string. Identifiers are actually written into a symbol table -- the symtabp is a pointer to a symbol table record. An integer has a value, NUMVAL. A string is written to a global packed array of char called STRTAB, starting at the index STX, and ending on a chr(0). Note that the type SEMRECP is a pointer to a SEMREC. We're going to build our AST around a tree whose nodes are page 7 Pascal in 25 Rules or Less linked by SEMRECP pointers. Each of the SEMREC structures will be allocated from the Pascal heap as needed, and we'll dispose of them all after we're through with them, at the end of scanning a tiny Pascal function. Arithmetic Operators The AST for the four arithmetic operators is the easiest to grasp. We extend SEMREC to these operators by adding two lines to our SEMREC declaration. The result is: SEMREC = record case SEMT: semtype of { semantic stack structure } ident: (SYMP: symtabp); fixed: (NUMVAL: integer); strng: (STX: integer); {index to string table} addop, subop, mpyop, divop: (LEFT, RIGHT: semrecp); end; Of course, we also need to add the enumerated type names addop, etc. to the SEMTYPE list: SEMTYPE = (OTHER, IDENT, FIXED, STRNG, ADDOP, SUBOP, MPYOP, DIVOP); Let's look at an example of an arithmetic expression and see how it is expressed as an AST: X * (Y-Z) / 15. Figure 1 shows the AST we want. The - operator must be performed first, since we need its result before the * operator, and it in turn is needed before the division / can be performed. This ordering is clearly dictated by an inflexible AST rule: the children of a node must be evaluated before the node itself. The leaves of our tree are identifiers (X, Y, Z) or numbers (15). A number clearly stands for itself, while each identifier stands for some numeric value at runtime. In fact, an identifier will be associated with some memory address whose contents will be fetched in an appropriate instruction. Let's next see how the AST of figure 1 would be described by SEMREC structures linked by pointers. That's shown in figure 2, where each SEMREC is a box, and its LEFT and RIGHT pointers are connected to other SEMREC boxes. The root is a SEMREC whose SEMT=divop. Its LEFT child is another SEMREC whose SEMT=mpyop. The leaves are also SEMREC's, since a SEMRECP pointer must point to a SEMREC, but the leaf nodes are either SEMT=ident or SEMT=fixed. The ident nodes point to a symbol table entry which will carry the identifier name, its type, its address and possible other information. How the AST is Built The AST will be built through parser actions. We don't have the space here to go into the theory of the parsing machine -- that's described in several different references [1, 2]. However, we'll try to explain enough to see how our page 8 Pascal in 25 Rules or Less AST gets built. Let's look at our arithmetic expression again: X * (Y-Z) / 15. When the parser scans this, it will go through the following sequence of production steps: Primary -> <identifier> ; <identifier> = X Primary -> <identifier> ; <identifier> = Y Primary -> <identifier> ; <identifier> = Z Expr -> Expr - Term #SUBOPR ; Y - Z Primary -> ( Expr ) #PAREN ; (Y-Z) Term -> Term * Primary #MPYOPR ; X * (Y-Z) Primary -> <integer> ; 15 Term -> Term / Primary #DIVOPR ; X * (Y-Z) / 15 Expr -> Term We've left out several intermediate steps, but these are the important ones. Essentially, a derivation of our expression is being reconstructed in reverse order -- we start with the expression, match various production right parts to the expression, and eventually end up with a single expression node -- Expr. Notice how the identifiers X, Y and Z are picked up first and `reduced' to a Primary. Then the Y and Z primaries are combined in the SUBOPR production. That agrees with our notion that the subtraction has to come first. The parentheses are covered by the PAREN production; we can then reduce the multiplication with the MPYOPR production. The integer 15 gets scanned next; we didn't need it until now, but it's involved in the division, and finally the division is covered through the DIVOPR production. This sequence of events is the same as if we evaluated the expression with a reverse-Polish calculator, like those manufactured by the Hewlett-Packard Co. We enter X, then enter Y, enter Z, subtract, multiply, enter 15 and divide. On each `enter', a value is pushed into a stack, which saves it for future reference. Each operation is between a number on the stack top and a number just below it; the stack is popped and the result appears on the stack top. Thus when the multiply is performed, it acts on the two operands X and (Y-Z). When the divide is performed, it acts on the operands X*(Y-Z) and 15. Using the Semantics Stack Qparser also provides a stack that works almost like that on a reverse-Polish calculator. The difference is that the Qparser stack can be very deep, and it is keyed to production apply actions. Let us explain what that means -- it's crucial to understanding how our AST is built. Qparser's stack is called SEM. SEM is declared as an array of SEMRECP pointers. The stack top index is named TOS, which increases as more items are pushed onto the stack and decreases as the stack is popped. Thus SEM[TOS]^ points to a SEMREC object on the top of the stack; SEM[TOS-1]^ points to an object just below the stack top, etc. (SEM[n] may also be NIL, which means there is no object.) Now the Qparser software will perform the following page 9 Pascal in 25 Rules or Less services for you in response to scanning its input source: 1. Each tagged production will cause procedure APPLY to be called with an integer parameter that designates the production. The APPLY procedure looks like this: procedure APPLY(PFLAG: integer; var TSEMP: semrecp); begin case PFLAG of ADDOPR: ... ASSIGN: ... BLOCK: ... etc. WHILEDO: ... VDECL: ... end end; Note that PFLAG causes an immediate branch to one of the legs of the CASE statement -- a leg that corresponds to one particular production. 2. When a production such as Term -> Term * Primary is invoked through an APPLY call, the top three SEM stack elements will be aligned with Term, *, and Primary, respectively. SEM[TOS] will point to something associated with Primary, SEM[TOS-1] will be NIL (since * needn't carry information), and SEM[TOS-2] will point to something associated with Term. This is an important concept, and is illustrated in figure 3. Just before our APPLY action, the stack has three elements on its top, which carry pointers to the roots of the trees X and (Y-Z), respectively. (See figure 2 for the whole tree.) 3. In the APPLY procedure, you have an opportunity to do something about the SEM information, and to return a pointer to a SEMREC through the var parameter TSEMP. Consider production Term -> Term * Primary again (figure 3). We assume that SEM[TOS-2] (= Term) points to a tree node, as does SEM[TOS] (= Primary). We want to construct a new SEMREC tree node whose SEMT is MPYOP, whose LEFT will point to Term and whose RIGHT will point to Primary. TSEMP will be assigned to point to this new node. That's easy -- here's the Pascal code fragment needed for the job: MPYOPR: { Term -> Term * Primary } begin new(tsemp); {allocate a new node} with tsemp^ do begin {and fill in its record fields} left:=sem[tos-2]; {points to Term} right:=sem[tos]; {points to Primary} semt:=mpyop; end end; 4. When APPLY returns, the var parameter TSEMP is returned to the parser, which will (a) strip off the top three pointers on the SEM stack then (b) push the TSEMP pointer. Notice that the top three pointers have already page 10 Pascal in 25 Rules or Less been linked to TSEMP, so losing them is no big deal. By pushing TSEMP on the stack, we have effectively associated our new tree root with the Term on the left side of the production Term -> Term * Primary. Refer to figure 4. This is how the stack will look just after returning from APPLY. The top three pointers have been scraped off, and replaced by the TSEMP pointer. We clearly have a larger tree rooted in the stack top -- again look at figure 2 for the whole tree, and you'll see that we've added one node to it. That closes the loop on our operations. Eventually our new Term will show up again in the APPLY procedure, this time as an element in the right part of some production; it will carry the root of a tree, and that tree will get built into something bigger. Certain actions are done for us automatically by the parser and lexical analyzer system of Qparser -- the <identifier>, <integer> and <string> forms are loaded into the SEM stack automatically when and where they are needed. For example, consider the production Stmt -> <identifier> := Expr, tagged ASSIGN in our production set. When this pops up in an APPLY call, there will be a SEMREC pointer in SEM[TOS-2] that corresponds to the <identifier>; it will carry the tag SEMT=ident, and the SYMP pointer will point to a symbol table entry that corresponds to the identifier. Single productions needn't be tagged -- the parser will preserve whatever is attached to the right member, passing it along to the left member. (A single production has exactly one token in its right member, for example Primary -> <identifier>). Also, TSEMP is by default NIL, so if we choose not to set TSEMP, a NIL will be pushed on the stack. Other Tree Nodes Well, we've seen how an arithmetic expression is built up as an AST. The same principle can be applied to any form in our language. Let's look at some others. Consider the assignment production Stmt -> <identifier> := Expr, tagged ASSIGN. We'll use the tag SEMT=assignop for it, and use our friends LEFT and RIGHT as pointers to the <identifier> and Expr. The APPLY code that builds this tree node looks almost the same as for the multiply case: ASSIGNOP: { Stmt -> <identifier> := Expr } begin new(tsemp); {allocate a new node} with tsemp^ do begin {and fill in its record fields} left:=sem[tos-2]; right:=sem[tos]; semt:=assignop; end end; In fact, it looks so much alike that we've created a procedure that deals with all these binary operator cases, called BIN_TREE; this takes one parameter, the SEMT value, and expects to find a LEFT node in SEM[TOS-2] and a RIGHT page 11 Pascal in 25 Rules or Less node in SEM[TOS]. Not all the tree nodes have binary operators. Let's look at an expression list, which is carried by two productions: ExprList -> ExprList , Expr #EXPRLIST2 -> Expr #EXPRLIST1 These don't look quite like our binary operators, but in fact can be handled in a very similar way. We'll create yet another SEMT enumerated type EXPR_LIST with a LEFT and RIGHT. LEFT will point to an expression subtree, but RIGHT will be NIL or point to another EXPR_LIST. Figure 5 shows the general idea for a list of two subtrees. (This structure is borrowed from Lisp -- the LEFT pointer acts like CAR and the RIGHT pointer acts like CDR). The code for these two productions looks like this: EXPRLIST1: { ExprList -> Expr } if not(is_void(sem[tos])) then begin tsemp:=new_sem(expr_list); tsemp^.left:=sem[tos]; end; EXPRLIST2: { ExprList -> ExprList , Expr } if not(is_void(sem[tos])) then begin tsemp:=new_sem(expr_list); tsemp^.left:=sem[tos]; tsemp:=nconc(sem[tos-2], tsemp); end else tsemp:=sem[tos-2]; Some new functions are shown in this code fragment. NEW_SEM allocates a SEMREC from the Pascal heap using NEW, and assigns its SEMT to the passed parameter. More importantly, it also initializes LEFT and RIGHT (or other pointers) to NIL, rather than letting them be garbage. NEW_SEM is always used rather than NEW directly in order to achieve this useful initialization. The IS_VOID function tests its SEMREC parameter for NIL. Finally, NCONC takes two parameters, a list L and an element E. It attaches the element to the end of the list by altering the RIGHT pointer of the last element of the list L to point to the element E. (Again, this is inspired by the Lisp function of the same name). Let's now look at these operations in more detail. The EXPRLIST1 operation must either return NIL or an EXPR_LIST that points to the Expr. It turns out that Expr will never be NIL, but we've written it this way for robustness. The EXPRLIST2 operation looks at the Expr; if it is NIL, then we can simply return the pointer to the ExprList (it, too, may be NIL). If Expr isn't NIL, then we call NCONC to stitch it onto the end of the ExprList, as suggested by figure 5. Function Declaration It's time to consider the function declaration production FuncDecl -> FUNCTION <identifier> ( ExprList ) ; Stmt page 12 Pascal in 25 Rules or Less #FDECL When this pops up in an APPLY action, we have scanned through the function name (the <identifier>), its parameter list (ExprList), and its body (the Stmt). Assuming we've done all our AST building correctly, the ExprList and Stmt are the roots of some trees, possibly very large. The scanning and parsing action have worked through the function declaration completely, but we haven't (1) generated any code, nor (2) done anything about declaring the procedure or its parameters. In fact, we need to perform these actions in this order: 1. Add the function identifier to the symbol table at the global scope level, if it isn't already there. (It could have been previously declared with an empty Stmt part, i.e. as a forward.) 2. Raise the scope level by one. 3. Add the ExprList identifiers to the symbol table, checking each one for multiple declarations. These should each be a single identifier rather than an expression; we can easily test their tree roots for this case and complain about an error. We did it this way to save on productions. Each of the identifiers will be assigned a location in the stack relative to base register BP -- we'll explain how that works shortly. 4. Evaluate the Stmt tree. A recursive procedure is needed for this that will work through the entire statement tree, emitting code appropriate to each of the forms in our language. We'll see that this is much easier than it appears on the surface and is almost trivial for some of the language forms. All we need are some simple translation forms in 8086 assembler and our AST will unfold very neatly into code. During the evaluation phase, every identifier will have a pointer to a symbol table entry which will carry a relative address and a type for the name. That will be useful in generating instructions for the identifier. The Symbol Table We need to describe our symbol table system. There are various ways of organizing symbol tables; we're going to focus on the one used in the Qparser system. User names are associated with the <identifier> form in the grammar. The default <identifier> form is a Pascal name -- something that starts with a letter and continues with letters, digits, or underscores. The first character that isn't in this set stops the name -- it could be a space, a left parenthesis, or whatever. Names will be upper cased as in Pascal. Note that certain keywords look like identifiers, for example, IF, WHILE, BEGIN, etc. The lexical scanner scans one of these as though it were an identifier, then searches the symbol table for it. The keywords have previously been entered and tagged as such, therefore a successful `find' will immediately indicate that the supposed identifier is in fact a keyword. Making this distinction is vital to the page 13 Pascal in 25 Rules or Less parser, since the keywords are important clues to recognition of the program phrases. If the identifier isn't a keyword, it will be entered anyway under an innocuous tag (user). Now that we've discussed how the symbol table works in general, let's look at the symbol table record structure: type SYMBOL = string[maxtoklen]; SYMTYPE = (RESERVED, SYMERR, USER, VAR_TYPE, FUNC_TYPE); SYMTABP = ^symtabtype; SYMTABTYPE = record { structure for <identifier>s and keywords } NEXT: symtabp; LEVEL: int; SYM: symbol; case SYMT: symtype of reserved: (TOKVAL: tokrange); var_type: (VADDR: integer); {relative to BP} func_type: (FADDR: integer; {code location} PBYTES: integer; {bytes of parameters} IS_ACTUAL, {TRUE if an actual declaration has been seen} IS_SYSTEM {system procedure, demands special treatment} : boolean); end; There are five classes of symbol. RESERVED is for the keywords. SYMERR is used during error recovery when the parser needs to make up an identifier for insertion purposes. USER is a generic name attached to user identifiers when they are first seen. VAR_TYPE refers to the global and local variable names -- its VADDR field provides (in the local case only) an offset from BP. FUNC_TYPE refers to a declared function. These are somewhat complicated in our tiny Pascal. FADDR actually isn't used, but could be to refer to a code location. Since we generate symbolic assembly code, we will just produce a symbol procedure label and let the assembler worry about where the function will be located. PBYTES is the number of bytes of formal parameters the function carries. We need this in order to generate an appropriate EXIT instruction, and also to generate calls. IS_ACTUAL will be false until a declaration with a full statement body is seen; after that, IS_ACTUAL will be true. We use this to catch such errors as undeclared functions and two functions with full statement bodies. IS_SYSTEM designates the function as a `system' function, slated for special handling. Our WRITE, WRITELN, READ and a couple of other functions are in this class. These are supported by special assembler routines and aren't called quite the same way as user-declared functions. A symbol name is stuffed in SYM, which is just a string. NEXT carries a link to another SYMTABTYPE record, and is page 14 Pascal in 25 Rules or Less used in a hash table linked-list system for rapid symbol searching. The LEVEL parameter designates the scoping level of the name. Reserved names are carried at level -1. Global names are at level 0 and locals at level 1. In full Pascal, the LEVEL would be incremented by one for every descent into a lexically nested procedure or function. Tiny Pascal supports only one global level of procedures, so LEVEL never exceeds 1. There are several major symbol table actions of interest in tiny Pascal. Reserved names are pushed into an otherwise-empty table as the first action of the compiler, at level -1. This action is essentially automatic by virtue of the Qparser generator. When an <identifier> form is seen, the identifier string is picked up by the lexical analyzer and entered in the table, if not already there. The analyzer also builds a SEMREC structure of type IDENT and pushes it into the SEM stack at the appropriate place. APPLY will therefore `see' an IDENT structure of type USER upon the first appearance of some user identifier. Eventually, we choose to add attributes to the identifier. This will occur when a production appears that reveals what attributes are required. In tiny Pascal, that production is the FuncDecl production FDECL. The action required is to scan through the ExprList, and change the attributes of the names to VAR_TYPE, assigning VADDR values as we go. Correction: if the name doesn't carry a USER tag, it may have been previously declared as a global variable or a function; we need to override the previous use without destroying the previous use. Our symbol table scheme permits doing that by carrying successive names in a first-in-first-out stack organization. Once these names have been declared, there will be no further declarations of names in the function body; only references to names that should previously have been declared. These name references will show up as we scan the AST in the EVAL function. Among other things, we watch out for identifiers carrying a USER tag -- these haven't been declared anywhere and deserve an error complaint. Getting Rid of Unwanted Names When the function body has been fully evaluated and all code generated, we need to get rid of the local variables in the symbol table. A function called CLEARSYM does the trick. It locates all the names in the symbol table whose LEVEL is greater than 0, and removes them from the table. That's important, since the scope of a function's local variables is the lexical range of that function and no further. By doing this, we make it possible to use the same name in different procedures without confusing the compiler or assembler. page 15 Pascal in 25 Rules or Less The Run Time Environment We need to describe how we propose to generate 8086 code from our AST. (If you're not familiar with the 8086, we recommend [3] as a reference.) There are four kinds of code generation problem confronting us: 1) Function declaration -- what assembler forms are needed to open a function and what are needed to close it? 2) Function call -- how are the parameters passed and how is the call generated? 3) Expression and assignment evaluation -- how are the register and memory instructions used to maximum benefit and efficiency? 4) Control structures -- how are the conditional and unconditional branches to be coded? We'll address each of these in turn. Unfortunately, items (1) and (2) depend on each other. Our procedure call strategy must be defined before we can decide on just how a call and a declaration are to be carried out. That will also determine how our local variable addresses are computed. So let's look at function calls first. Function Call Environment Our language is intended to support recursion, which means that a given function may have been called several times before any exit has taken place. Each of these uncompleted calls is called an activation, and each activation requires an activation record that contains a set of local variable values, a return value and a return address. The activation records will be carved out of the 8086's stack, where the stack top is pointed to by register SP. The micro's stack will also be used to hold temporary expression values, so SP will be moving up and down in a somewhat unpredictable fashion. We need a way of marking the position of the activation record that doesn't depend on SP. The current activation record will be pointed to by register BP. Figure 6 shows our activation record design. The stack top is at the bottom, perversely. (Does that mean that we programmers don't know which way is up?) In the 8086, the PUSH operation causes SP to decrease in numeric value -- the stack will start at the end of some segment and work toward its beginning. The figure shows an activation record for a function with three parameters, P1, P2, and P3, in left-to-right order, as it might appear when we're ready to start executing some function body instructions. However, the record got that way partly through call operations and partly through code at the front end of the function. The function call will 1) reserve space for a function return value through a PUSH AX operation, 2) PUSH each parameter in their order of appearance, and then 3) CALL the function. The CALL instruction will push the return address on the stack. At this point, SP will point to the stack top page 16 Pascal in 25 Rules or Less (of course), and BP will be pointing at some previous activation record deeper in the stack. Among the first instructions executed within the function will be PUSH BP MOV BP,SP The first of these pushes the previous value of BP in the stack, and the second sets BP to the new stack top. Notice that parameter P1 is at BP+8 (each stack location is a 2-byte word), P2 is at BP+6, P3 is at BP+4, and the function return value is at BP+10. These offsets can be calculated by the compiler and associated with each of the respective names. Furthermore, the 8086 instruction set and our assembler permit us to use forms like 8[BP] to refer to an address offset from BP by +8 bytes. We can therefore easily generate symbolic address references to any of the function's variables. Function EXIT An EXIT from a function is similarly easy. We need to restore the previous BP value to the one carried at BP in the stack, then return, popping the right number of parameters from the stack. For the configuration of figure 6, we therefore need POP BP RET 8 We overlooked one thing -- since we are dealing with a function, it's important to send the return value back to whatever expression the function was called from. We're going to adopt the convention that every expression will yield its value in the 16-bit AX register. However, our function's return value is in the stack. We therefore need the instruction MOV AX,10[BP] before we POP BP. Then we can remove the parameters and return value from the stack upon a RET instruction. Note that the `10' in this instruction is equal to 4 plus twice the number of parameters of the function. Also the `8' in the RET instruction is 2 plus twice the number of parameters. Function CALL Before we discuss function calling, we need to assert something about the way expressions will be evaluated. We will write a general-purpose procedure called EVAL that takes an expression root (a SEMREC pointer), then generates code such that the arithmetic result of the tree will be left in the AX register. By asserting this first, we can design EVAL to make sure that that is in fact what will happen in every case. We can also use EVAL itself to deal page 17 Pascal in 25 Rules or Less with subtrees, with assurance that AX will receive the result. Of course, it's important that EVAL never change the value of BP or set SP capriciously. It may push some stuff on the stack, but we'll also make sure that whatever is pushed will eventually be popped. Certain other 8086 registers may also be used in very local situations, for example in connection with a multiply or divide instruction. Given that EVAL will take an AST tree and produce code for AX, calling a function will be very easy. In fact, the function call will show up in EVAL itself, whenever we see the SEMT=funcall! Here's what we need to do (refer to figure 6): PUSH AX ; this reserves a return value word eval(P1) ; parameter P1 is evaluated, result to AX PUSH AX ; push result on the stack eval(P2) PUSH AX etc. CALL function We can of course just use the function name in the CALL instruction, since the assembler can figure out where it is. Note that this code sequence will work nicely within an expression, since the result of the CALL is to pop down the stack to exactly the place just before the first PUSH AX, and to yield the function's result in AX. We can also use this sequence for a procedure call, where the return value is ignored. AX will just get lost, and there's no change in the stack. ADD and SUB Let's now turn our attention to evaluating other expressions. We need to take a close look at the 8086 instruction set and decide just how to handle arithmetic. We find that integer addition and subtraction are handled alike, but are coded somewhat differently than multiplication and division. Our ADD or SUB will be between AX and some operand, which may be another register value, an immediate value, a memory value or an indirection through a register. Our AST will look like figure 7. We imagine that the LEFT subtree will somehow yield AX -- that's easy if we just call EVAL on the LEFT subtree. We would then like to emit an ADD or SUB instruction whose operand is the right subtree. Unfortunately, that's not possible if the right subtree is anything other than 1) a literal value, i.e. SEMT=fixed, or 2) a global variable, or 3) a local variable. A global variable can be accessed by its name alone, while a local will be accessed as an offset through BP, e.g. 6[BP]. So before we just go ahead and evaluate the LEFT subtree, we need to test the RIGHT subtree -- is it `simple', i.e., will it fit into an ADD or SUB instruction operand field? If it is, then we can just EVAL(LEFT), then emit the appropriate ADD or SUB instruction. The more difficult case in fact requires that we save the page 18 Pascal in 25 Rules or Less result of evaluating one of the subtrees before evaluating the other one. Since we want the left one in AX in order to emit an instruction, we must evaluate the right one first. That will yield the result in AX, but we musn't just leave it there while evaluating the left subtree -- AX will very likely get changed as a result. Our solution is to PUSH AX before evaluating the LEFT subtree. After that evaluation, AX will hold the left subtree. We choose to POP DX, then emit an ADD or SUB instruction between AX and DX. We therefore arrive at the following compiler coding for the ADD and SUB cases of EVAL: addop, subop: if is_simple(right) then begin eval(left); {goes to AX} code3(opcode(semt), ' AX,', nameof(right)); end else begin eval(right); code('PUSH AX'); {put in stack temporarily} eval(left); {left side to AX} code('POP DX'); {get the right value back from stack} code2(opcode(semt), ' AX,DX'); end; Some unfamiliar functions are used in this code fragment. OPCODE(SEMT) returns a string that represents the 8086 instruction for the SEMT. NAMEOF(root) returns a string that represents a valid operand for the subtree whose root is root -- it may be a literal constant, a named variable, or an offset-BP form. CODE takes one string and formats it as a symbolic assembler line with no label. A space in the string is interpreted as a tab stop, to produce a pretty formatted effect in the assembly code. CODE2 takes two strings, concatenates them and calls CODE with the result. CODE3 and CODE4 are similar. Note that although we pushed a word in the stack in the non-simple case, it gets popped out two lines later. We therefore can assert that EVAL will keep the stack orderly, provided it does so everywhere else. MPY and DIV These operations are somewhat more messy to encode in the 8086. The IMUL instruction requires one operand in the AX register, the other in another register (we will use CX). It returns a result in the pair DX-AX, with AX carrying the least significant word. Without attempting any optimizations, we therefore evaluate the right subtree, PUSH it, evaluate the left, then POP CX and emit the opcode. Division requires one additional step -- DX should be a sign extension of AX before dividing by CX. Fortunately, the instruction CWD performs this for us. Here's the resulting EVAL code fragment: mpyop, divop: begin eval(right); {divisor to AX} page 19 Pascal in 25 Rules or Less code('PUSH AX'); eval(left); {dividend to AX} if semt=divop then code('CWD'); {sign extend into DX} code('POP CX'); code2(opcode(semt), ' CX'); end; Assignment The assignment operation calls for a certain adjustment, depending on whether the right subtree is a literal fixed-point number or not. Here's the EVAL fragment: assignop: begin if right^.semt=fixed then {an immediate on the right is OK} code4('MOVW ', nameof(left), ',', nameof(right)) else begin eval(right); {goes to AX} code3('MOV ', nameof(left), ',AX'); end end; The use of MOVW is required since the left operand will be a name or an indirect BP reference (we can only assign to a variable or a function return value), while the right operand could be a byte or a word. We therefore need a word MOV in this case. In the other case, the right subtree is EVALed as usual, and the right MOV operand will be AX, stamping this as a word operation. Thus we must nurse the assembler along which otherwise wouldn't have the foggiest idea of which operation we intend. WHILE DO The WHILE-DO form is supported by a LEFT pointer and a RIGHT pointer. LEFT points to an expression subtree carrying the boolean expression (it will be tested for >0 = TRUE, and <=0 = FALSE). We have to encode this as 8086 branch instructions, and have no idea how far the branches must reach. Thus we use long branch instructions, hoping that an intelligent assembler can substitute short branches if possible. We need to manufacture a couple of temporary labels, and assign that job to a function NEW_LABEL -- this increments a global label counter, then appends its string value to the string 'XXX'. Thus we create new labels such as XXX1, XXX2, etc. as needed. The function LCODE takes two strings; the first one will become an assembler label, and the second the opcode-operand field. Here's the EVAL code fragment for a WHILE-DO: while_do: begin label1:=new_label; lcode(label1, 'EQU $'); eval(left); {boolean condition} code('CMP AX,0'); label2:=new_label; page 20 Pascal in 25 Rules or Less code2('JLE ', label2); eval(right); {statement or statement list} code2('JMP ', label1); lcode(label2, 'EQU $'); end; Note that we attach a label before evaluating the boolean condition; this will later appear in a JMP around the whole WHILE-DO form back to the boolean test. Following the boolean test is a comparison of AX to 0 (recall that EVAL yields a result in AX). We need the CMP since we aren't sure how AX got loaded, hence whether the sign flag of the 8086 was set correctly. The comparison is followed by a conditional jump to the end of the WHILE-DO form -- note how label2 reappears on an EQU $ as the last coded line. The beauty of EVAL appears between the conditional and the unconditional jump. It's quite capable of dealing with any number of statements, function and procedure calls, etc., so that a quite large and sophisticated sequence of code will appear in that innocent eval(right) call. Yet we drop in a JMP and the labelled EQU $ at just the right place. And -- the correctness of this evaluation speaks for itself. Statement List We've seen how a sequence of expressions is turned into a linked list of SEMREC nodes. We do the same thing with a sequence of statements. An EVAL fragment can be written for a STMTLIST as follows: stmtlist: while root<>nil do begin eval(root^.left); root:=root^.right; end; The ROOT is of course a value parameter passed to EVAL; its LEFT points to some statement and its RIGHT to the rest of the statement list. It's OK in Turbo Pascal to change the value of a passed value parameter, so we just do it. Once a ROOT is evaluated, we don't need it anymore, so we might as well use it to point to the next one. This could also have been coded using a recursive call on EVAL, but sequences of statements might get very lengthy, and could use up lots of stack space before they unwind. IF-THEN-ELSE Since an IF-THEN-ELSE has three parts, we need a special SEMREC clause to deal with it: if_then_else: {if B1 then S1 else S2} (B1, S1, S2: semrecp); This node of an AST is then easily coded as follows. We've discussed all the functions that appear in this EVAL code fragment: page 21 Pascal in 25 Rules or Less if_then_else: begin label1:=new_label; eval(b1); {boolean condition} code('CMP AX,0'); code2('JLE ', label1); eval(s1); {THEN statement} label2:=new_label; code2('JMP ', label2); lcode(label1, 'EQU $'); eval(s2); {ELSE statement} lcode(label2, 'EQU $'); end; Summary The following listing was generated by the Chasm assembler. It is the compiled version of program TURUN given near the beginning of this paper. This is a complete program. It fits into one segment, starting at address 100H. The stack pointer is set to location STACKORG, which is at the end of a 2000 byte block near the end of the program. (This is actually unnecessary since DOS will by default set SP to the end of a segment.) The program then begins with a call to MAIN. When MAIN returns, an INT 020H is executed, which returns control cleanly to DOS. You can step through this program with the DOS DEBUG system if you want to see what's happening in detail. Refer to the DOS manual for details. A few points about this assembled listing: A bug in our version of the Chasm assembler (which I've been assured has been repaired) made it impossible to code the RET instruction with a number; instead we've written the opcode as a DB followed by the number as a DW. You'll see this in line 124 for example. Chasm also noticed four JMP instructions that could be encoded as short jumps. As we've explained, our simple compiler doesn't keep track of the span between a JMP and its target, and that span could be any number of bytes. Thus we've taken the safe approach in our compiler and coded long jumps. An optimizing assembler could turn these into short jumps. Also, our compiler has copied the contents of a support file called STDIO.HDR (not to be confused with the C library of the same name) into the assembly code. STDIO supports the WRITE and READ functions. You can see where this file begins and ends from the comments. Are you impressed? We hope so. If you've followed our development and reasoning this far, you will have noticed that the code generation algorithms for our tiny Pascal statements seem very clear and transparent. Yet when you look at the generated assembly code, it isn't at all clear what's going on. Some of the PUSH AX instructions came from one part of the compiler algorithm and some from another part. But that's the way it is. Many software writers prefer a high-level programming language because a few very clear source statements can turn page 22 Pascal in 25 Rules or Less into a rats nest of assembly code -- but if the compiler generates that code correctly, we don't care how convoluted and hard-to-understand the resulting code is. TURUN.ASM 2/19/86 Page: 1 12:56:04 LOC OBJ LINE SOURCE CHASM version 4.00 0100 1 ; Tiny Pascal assembler code 0100 BC290B 2 MOV SP,OFFSET(STACKORG) 0103 8BEC 3 MOV BP,SP 0105 E81901 4 CALL MAIN 0108 CD20 5 INT 020H 010A 6 ; <STDIO.HDR> included 010A 7 ; STDIO.HDR 010A 8 ; 010A 9 ; READ and WRITE routines needed for Tiny Pascal 010A 10 ; 010A 11 SYS_RCHAR PROC NEAR ; Read single character from 010A B401 12 MOV AH,1 010C CD21 13 INT 021H 010E C3 14 RET ; value comes back in AL 010F 15 ENDP 010F 16 010F 17 SYS_WRCHAR PROC NEAR ; Write a single character ( 010F B402 18 MOV AH,2 0111 CD21 19 INT 021H 0113 C3 20 RET 0114 21 ENDP 0114 22 0114 23 SYS_WHEX PROC NEAR ; Write a single HEX number 0114 80FA0A 24 CMP DL,10 0117 7C07 25 JL SYS_01 0119 80C237 26 ADD DL,55 ; 'A' - 10 011C E8F0FF 27 CALL SYS_WRCHAR 011F C3 28 RET 0120 80C230 29 SYS_01 ADD DL,'0' 0123 E8E9FF 30 CALL SYS_WRCHAR 0126 C3 31 RET 0127 32 ENDP 0127 33 0127 34 SYS_IWRT PROC NEAR ; Write an integer to stdout 0127 B604 35 MOV DH,4 ; used as a counter 0129 D1C0 36 SYS_11 ROL AX 012B D1C0 37 ROL AX 012D D1C0 38 ROL AX 012F D1C0 39 ROL AX 0131 8AD0 40 MOV DL,AL 0133 80E20F 41 AND DL,0FH 0136 50 42 PUSH AX 0137 E8DAFF 43 CALL SYS_WHEX 013A 58 44 POP AX 013B FECE 45 DEC DH 013D 75EA 46 JNZ SYS_11 013F C3 47 RET 0140 48 ENDP 0140 49 page 23 Pascal in 25 Rules or Less 0140 50 SYS_SWRT PROC NEAR ; Write a string terminated 0140 8A5700 51 SYS_21 MOV DL,0[BX] 0143 80FA00 52 CMP DL,0 0146 7501 53 JNZ SYS_22 ; zero terminator? 0148 C3 54 RET 0149 E8C3FF 55 SYS_22 CALL SYS_WRCHAR 014C 43 56 INC BX 014D EBF1 57 JMPS SYS_21 014F 58 ENDP 014F 59 014F 60 SYS_WRTLN PROC NEAR ; write carriage return/lin 014F B20D 61 MOV DL,0DH 0151 E8BBFF 62 CALL SYS_WRCHAR 0154 B20A 63 MOV DL,0AH 0156 E8B6FF 64 CALL SYS_WRCHAR 0159 C3 65 RET 015A 66 ENDP 015A 67 015A 68 READ PROC NEAR ; read a HEX number fr 015A BA0000 69 MOV DX,0 ; clear DX 015D E8AAFF 70 READ_01 CALL SYS_RCHAR ; get one character in 0160 71 ; won't affect DX 0160 3C0D 72 CMP AL,0DH 0162 7506 73 JNZ READ_02 0164 52 74 PUSH DX ; save the thing we've 0165 E8E7FF 75 CALL SYS_WRTLN ; send a carriage retu 0168 58 76 POP AX ; was an ENTER 0169 C3 77 RET 016A 3C20 78 READ_02 CMP AL,' ' 016C 74EF 79 JZ READ_01 ; ignore spaces 016E 2C30 80 SUB AL,'0' ; start conversion to 0170 3C09 81 CMP AL,9 0172 7E02 82 JLE READ_03 0174 2C07 83 SUB AL,7 ; turn 'A' into 0AH 0176 3C0F 84 READ_03 CMP AL,0FH 0178 7E02 85 JLE READ_04 017A 2C20 86 SUB AL,32 ; turn 'a' into 0AH 017C 240F 87 READ_04 AND AL,0FH ; clip for good measur 017E D1E2 88 SHL DX ; prepare DX for hex v 0180 D1E2 89 SHL DX 0182 D1E2 90 SHL DX 0184 D1E2 91 SHL DX 0186 08C2 92 OR DL,AL 0188 EBD3 93 JMPS READ_01 ; go do some more 018A 94 ENDP 018A 95 018A 96 READLN PROC NEAR 018A EBCE 97 JMPS READ ; does the same thing 018C 98 ENDP 018C 99 018C 100 ; ... end of include STDIO.HDR 018C 101 ; {TURUN -- A sample program written in Tiny Pascal 018C 102 ; var I, J, K, PROBLEM; 018C 103 ; 018C 104 ; {*********************} 018C 105 ; function ISLESS(N1, N2); 018C 106 ; begin {returns 1 if n1<n2, 0 otherwise} 018C 107 ; if n2-n1 then isless:=1 {truth value test is page 24 Pascal in 25 Rules or Less 018C 108 ; else isless:=0; 018C 109 ; end; 018C 110 ISLESS PROC NEAR 018C 55 111 PUSH BP 018D 8BEC 112 MOV BP,SP 018F 8B4604 113 MOV AX,4[BP] ; N2 0192 2B4606 114 SUB AX,6[BP] ; N1 0195 3D0000 115 CMP AX,0 0198 7E08 116 JLE XXX0 019A C746080100 117 MOVW 8[BP],1 ; ISLESS **** Diagnostic: Could use JMPS 019F E90500 118 JMP XXX1 01A2 119 XXX0 EQU $ 01A2 C746080000 120 MOVW 8[BP],0 ; ISLESS 01A7 121 XXX1 EQU $ 01A7 8B4608 122 MOV AX,8[BP] ; ISLESS 01AA 5D 123 POP BP 01AB C2 124 DB 0C2H 01AC 0600 125 DW 6 01AE 126 ENDP 01AE 127 ; SYMBOL TABLE 01AE 128 ; ISLESS 8[BP] 01AE 129 ; N1 6[BP] 01AE 130 ; N2 4[BP] 01AE 131 01AE 132 ; 01AE 133 ; function ADDEMUP(LOWER, UPPER, SUM); 01AE 134 ; begin end; {makes it a forward declaration} 01AE 135 ; 01AE 136 ; {*********************} 01AE 137 ; function ISEQUAL(N1, N2); 01AE 138 ; begin 01AE 139 ; if n2-n1 then isequal:=0 {false} 01AE 140 ; else 01AE 141 ; if n1-n2 then isequal:=0 01AE 142 ; else isequal:=1; 01AE 143 ; end; 01AE 144 ISEQUAL PROC NEAR 01AE 55 145 PUSH BP 01AF 8BEC 146 MOV BP,SP 01B1 8B4604 147 MOV AX,4[BP] ; N2 01B4 2B4606 148 SUB AX,6[BP] ; N1 01B7 3D0000 149 CMP AX,0 01BA 7E08 150 JLE XXX2 01BC C746080000 151 MOVW 8[BP],0 ; ISEQUAL **** Diagnostic: Could use JMPS 01C1 E91800 152 JMP XXX3 01C4 153 XXX2 EQU $ 01C4 8B4606 154 MOV AX,6[BP] ; N1 01C7 2B4604 155 SUB AX,4[BP] ; N2 01CA 3D0000 156 CMP AX,0 01CD 7E08 157 JLE XXX4 01CF C746080000 158 MOVW 8[BP],0 ; ISEQUAL **** Diagnostic: Could use JMPS 01D4 E90500 159 JMP XXX5 01D7 160 XXX4 EQU $ 01D7 C746080100 161 MOVW 8[BP],1 ; ISEQUAL 01DC 162 XXX5 EQU $ page 25 Pascal in 25 Rules or Less 01DC 163 XXX3 EQU $ 01DC 8B4608 164 MOV AX,8[BP] ; ISEQUAL 01DF 5D 165 POP BP 01E0 C2 166 DB 0C2H 01E1 0600 167 DW 6 01E3 168 ENDP 01E3 169 ; SYMBOL TABLE 01E3 170 ; ISEQUAL 8[BP] 01E3 171 ; N1 6[BP] 01E3 172 ; N2 4[BP] 01E3 173 01E3 174 ; 01E3 175 ; {***********************} 01E3 176 ; function ADDEMUP(LOWER, UPPER, SUM); 01E3 177 ; {SUM is a local} 01E3 178 ; begin 01E3 179 ; sum:=0; 01E3 180 ; while isless(lower, upper) do begin 01E3 181 ; sum:=sum+lower; 01E3 182 ; lower:=lower+1; 01E3 183 ; end; 01E3 184 ; addemup:=sum+lower; { the last one was left ou 01E3 185 ; end; 01E3 186 ADDEMUP PROC NEAR 01E3 55 187 PUSH BP 01E4 8BEC 188 MOV BP,SP 01E6 C746040000 189 MOVW 4[BP],0 ; SUM 01EB 190 XXX6 EQU $ 01EB 50 191 PUSH AX 01EC 8B4608 192 MOV AX,8[BP] ; LOWER 01EF 50 193 PUSH AX 01F0 8B4606 194 MOV AX,6[BP] ; UPPER 01F3 50 195 PUSH AX 01F4 E895FF 196 CALL ISLESS 01F7 3D0000 197 CMP AX,0 01FA 7E15 198 JLE XXX7 01FC 8B4604 199 MOV AX,4[BP] ; SUM 01FF 034608 200 ADD AX,8[BP] ; LOWER 0202 894604 201 MOV 4[BP],AX ; SUM 0205 8B4608 202 MOV AX,8[BP] ; LOWER 0208 050100 203 ADD AX,1 020B 894608 204 MOV 8[BP],AX ; LOWER **** Diagnostic: Could use JMPS 020E E9DAFF 205 JMP XXX6 0211 206 XXX7 EQU $ 0211 8B4604 207 MOV AX,4[BP] ; SUM 0214 034608 208 ADD AX,8[BP] ; LOWER 0217 89460A 209 MOV 10[BP],AX ; ADDEMUP 021A 8B460A 210 MOV AX,10[BP] ; ADDEMUP 021D 5D 211 POP BP 021E C2 212 DB 0C2H 021F 0800 213 DW 8 0221 214 ENDP 0221 215 ; SYMBOL TABLE 0221 216 ; ADDEMUP 10[BP] 0221 217 ; LOWER 8[BP] 0221 218 ; UPPER 6[BP] 0221 219 ; SUM 4[BP] page 26 Pascal in 25 Rules or Less 0221 220 0221 221 ; 0221 222 ; {*********************} 0221 223 ; function MAIN(SUM, UPPER); 0221 224 ; begin 0221 225 ; i:=1; 0221 226 ; j:=i+5; 0221 227 ; k:=j-16; 0221 228 ; problem:=i+(j*k); 0221 229 ; writeln('I: ', i, ' J: ', j, ' K: ', k, ' Probl 0221 230 ; write('Enter upper '); 0221 231 ; upper:=read; 0221 232 ; sum:=addemup(1, upper); {sum of integers 1..up 0221 233 ; if isequal(sum, (upper*(upper+1))/2) then 0221 234 ; writeln('Sum = ', sum) 0221 235 ; else begin 0221 236 ; writeln('BUG: Sum = ', sum, '; should be ', 0221 237 ; (upper*(upper+1))/2); 0221 238 ; end; 0221 239 ; end; 0221 240 MAIN PROC NEAR 0221 55 241 PUSH BP 0222 8BEC 242 MOV BP,SP 0224 C70653030100 243 MOVW I,1 ; I 022A A15303 244 MOV AX,I ; I 022D 050500 245 ADD AX,5 0230 A35503 246 MOV J,AX ; J 0233 A15503 247 MOV AX,J ; J 0236 2D1000 248 SUB AX,16 0239 A35703 249 MOV K,AX ; K 023C A15703 250 MOV AX,K ; K 023F 50 251 PUSH AX 0240 A15503 252 MOV AX,J ; J 0243 59 253 POP CX 0244 F7E9 254 IMULW CX 0246 50 255 PUSH AX 0247 A15303 256 MOV AX,I ; I 024A 5A 257 POP DX 024B 01D0 258 ADD AX,DX 024D A35103 259 MOV PROBLEM,AX ; PROBLEM 0250 BB4D03 260 MOV BX,OFFSET(SS0) 0253 E8EAFE 261 CALL SYS_SWRT 0256 A15303 262 MOV AX,I ; I 0259 E8CBFE 263 CALL SYS_IWRT 025C BB4803 264 MOV BX,OFFSET(SS1) 025F E8DEFE 265 CALL SYS_SWRT 0262 A15503 266 MOV AX,J ; J 0265 E8BFFE 267 CALL SYS_IWRT 0268 BB4303 268 MOV BX,OFFSET(SS2) 026B E8D2FE 269 CALL SYS_SWRT 026E A15703 270 MOV AX,K ; K 0271 E8B3FE 271 CALL SYS_IWRT 0274 BB3803 272 MOV BX,OFFSET(SS3) 0277 E8C6FE 273 CALL SYS_SWRT 027A A15103 274 MOV AX,PROBLEM ; PROBLEM 027D E8A7FE 275 CALL SYS_IWRT 0280 E8CCFE 276 CALL SYS_WRTLN 0283 BB2B03 277 MOV BX,OFFSET(SS4) page 27 Pascal in 25 Rules or Less 0286 E8B7FE 278 CALL SYS_SWRT 0289 E8CEFE 279 CALL READ 028C 894604 280 MOV 4[BP],AX ; UPPER 028F 50 281 PUSH AX 0290 B80100 282 MOV AX,1 0293 50 283 PUSH AX 0294 8B4604 284 MOV AX,4[BP] ; UPPER 0297 50 285 PUSH AX 0298 B80000 286 MOV AX,0 029B 50 287 PUSH AX 029C E844FF 288 CALL ADDEMUP 029F 894606 289 MOV 6[BP],AX ; SUM 02A2 50 290 PUSH AX 02A3 8B4606 291 MOV AX,6[BP] ; SUM 02A6 50 292 PUSH AX 02A7 B80200 293 MOV AX,2 02AA 50 294 PUSH AX 02AB 8B4604 295 MOV AX,4[BP] ; UPPER 02AE 050100 296 ADD AX,1 02B1 50 297 PUSH AX 02B2 8B4604 298 MOV AX,4[BP] ; UPPER 02B5 59 299 POP CX 02B6 F7E9 300 IMULW CX 02B8 99 301 CWD 02B9 59 302 POP CX 02BA F7F9 303 IDIVW CX 02BC 50 304 PUSH AX 02BD E8EEFE 305 CALL ISEQUAL 02C0 3D0000 306 CMP AX,0 02C3 7E12 307 JLE XXX8 02C5 BB2403 308 MOV BX,OFFSET(SS5) 02C8 E875FE 309 CALL SYS_SWRT 02CB 8B4606 310 MOV AX,6[BP] ; SUM 02CE E856FE 311 CALL SYS_IWRT 02D1 E87BFE 312 CALL SYS_WRTLN **** Diagnostic: Could use JMPS 02D4 E92D00 313 JMP XXX9 02D7 314 XXX8 EQU $ 02D7 BB1803 315 MOV BX,OFFSET(SS6) 02DA E863FE 316 CALL SYS_SWRT 02DD 8B4606 317 MOV AX,6[BP] ; SUM 02E0 E844FE 318 CALL SYS_IWRT 02E3 BB0B03 319 MOV BX,OFFSET(SS7) 02E6 E857FE 320 CALL SYS_SWRT 02E9 B80200 321 MOV AX,2 02EC 50 322 PUSH AX 02ED 8B4604 323 MOV AX,4[BP] ; UPPER 02F0 050100 324 ADD AX,1 02F3 50 325 PUSH AX 02F4 8B4604 326 MOV AX,4[BP] ; UPPER 02F7 59 327 POP CX 02F8 F7E9 328 IMULW CX 02FA 99 329 CWD 02FB 59 330 POP CX 02FC F7F9 331 IDIVW CX 02FE E826FE 332 CALL SYS_IWRT 0301 E84BFE 333 CALL SYS_WRTLN 0304 334 XXX9 EQU $ page 28 Pascal in 25 Rules or Less 0304 8B4608 335 MOV AX,8[BP] ; MAIN 0307 5D 336 POP BP 0308 C2 337 DB 0C2H 0309 0600 338 DW 6 030B 3B2073686F75 339 SS7 DB '; should be ',0 6C6420626520 00 0318 4255473A2053 340 SS6 DB 'BUG: Sum = ',0 756D203D2000 0324 53756D203D20 341 SS5 DB 'Sum = ',0 00 032B 456E74657220 342 SS4 DB 'Enter upper ',0 757070657220 00 0338 2050726F626C 343 SS3 DB ' Problem: ',0 656D3A2000 0343 204B3A2000 344 SS2 DB ' K: ',0 0348 204A3A2000 345 SS1 DB ' J: ',0 034D 493A2000 346 SS0 DB 'I: ',0 0351 347 ENDP 0351 348 ; SYMBOL TABLE 0351 349 ; MAIN 8[BP] 0351 350 ; SUM 6[BP] 0351 351 ; UPPER 4[BP] 0351 352 0351 353 ; 0351 354 ; GLOBAL VARIABLES 0351 0000 355 PROBLEM DW 0 0353 0000 356 I DW 0 0355 0000 357 J DW 0 0357 0000 358 K DW 0 0359 359 ; RUNTIME STACK 0359 360 DS 2000 0B29 0000 361 STACKORG DW 0 0B2B 362 ; MAIN stack space 0B2B 0000 363 DW 0 0B2D 0000 364 DW 0 0B2F 0000 365 DW 0 0B31 366 ; NO errors 0 Error(s) detected 5 Diagnostic(s) offered 817 (331H) Bytes of object code generated Symbol Table Dump: ADDEMUP.........P01E3 I...............M0353 ISEQUAL.........P01AE ISLESS..........P018C J...............M0355 K...............M0357 MAIN............P0221 PROBLEM.........M0351 READ............P015A READLN..........P018A READ_01.........P015D READ_02.........P016A READ_03.........P0176 READ_04.........P017C SS0.............M034D SS1.............M0348 SS2.............M0343 SS3.............M0338 SS4.............M032B SS5.............M0324 SS6.............M0318 SS7.............M030B STACKORG........M0B29 SYS_01..........P0120 SYS_11..........P0129 SYS_21..........P0140 SYS_22..........P0149 page 29 Pascal in 25 Rules or Less SYS_IWRT........P0127 SYS_RCHAR.......P010A SYS_SWRT........P0140 SYS_WHEX........P0114 SYS_WRCHAR......P010F SYS_WRTLN.......P014F XXX0............P01A2 XXX1............P01A7 XXX2............P01C4 XXX3............P01DC XXX4............P01D7 XXX5............P01DC XXX6............P01EB XXX7............P0211 XXX8............P02D7 XXX9............P0304 Where to Obtain Software We invite you to try writing your own compiler, or at least to try the tools and source code we used in writing this one. Here's what you need -- you'll find the cost very reasonable: The Qparser demo system. Order from QCAD Systems, 1164 Hyde Ave, San Jose, CA 95129, phone 800/538-9787 (408/727-6671 in CA). Price is $10 plus $2 shipping and handling. CA residents add sales tax. Check, money order, or bank card accepted. This comes with a booklet that carries you through a simple pocket calculator example, but contains the all-important parser generator needed for our tiny Pascal compiler. If you want to go beyond 25 productions, the unlimited Qparser system will cost you $400. Source for the tiny Pascal compiler. Order from QCAD Systems, same address and phones as above. Ask for the "tiny Pascal disc": $10 plus $2 shipping and handling, plus sales tax, etc. The Chasm assembler. You can purchase an abridged version of this assembler (but powerful enough to support tiny Pascal) from PC Software Interest Group, 1030 E. Duane, suite D, Sunnyvale, CA 94086. Call 408/730-9291 for membership and ordering information. Ask for disk number 10. A membership in PC-SIG is $20 for one year, and entitles you to any number of software discs at $6 each from their catalog of over 540 discs. You can also order a full Chasm assembler for $40 directly from the author, David Whitman, 136 Wellington Terrace, Lansdale, PA, 19446, phones 215/641-7114 (day), 215/362-8526 (evenings). Tiny Pascal was written to be compatible with Chasm, but it can easily be adapted to other assembler conventions. The Qparser demo system disc can also be ordered from PC-SIG for $6 -- ask for disk number 419. (This version has the booklet information on the disc as a README file.) References [1] Compiler Construction--Theory and Practice, Barrett, Bates, Gustafson and Couch, Science Research Associates, Chicago, second edition. This is a companion textbook to the Qparser system. This and reference [2] develop the theory of grammars, parsers, translation, symbol table management and much more. [2] Principles of Compiler Design, Aho and Ullman, page 30 Pascal in 25 Rules or Less Addison-Wesley. [3] 8086 Family User's Manual, Intel Corp, 9800722-03. Order from the Literature Dept, Intel Corp, 3065 Bowers Ave, Santa Clara, CA, 95051. A definitive volume on the 8086 architecture, development systems, software and related semiconductor products. page 31